Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach
Finding joinable tables in data lakes is a key procedure in many applications
such as data integration, data augmentation, data analysis, and data market.
Traditional approaches that find equi-joinable tables are unable to deal with
misspellings and different formats, nor do they capture any semantic joins. In
this paper, we propose PEXESO, a framework for joinable table discovery in data
lakes. We embed textual values as high-dimensional vectors and join columns
under similarity predicates on high-dimensional vectors, thereby addressing the
limitations of equi-join approaches and identifying more meaningful results. To
efficiently find joinable tables with similarity, we propose a block-and-verify
method that utilizes pivot-based filtering. A partitioning technique is
developed to cope with the case when the data lake is large and the index
cannot fit in main memory. An experimental evaluation on real datasets shows
that our solution identifies substantially more tables than equi-joins and
outperforms other similarity-based options, and the join results are useful in
data enrichment for machine learning tasks. The experiments also demonstrate
the efficiency of the proposed method. Comment: Full version of paper in ICDE 202
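The pruning idea behind pivot-based filtering can be illustrated with a minimal Python sketch. It assumes Euclidean distance and treats a target column as joinable when the fraction of query vectors that have at least one target vector within distance `tau` reaches a threshold `t_min` (an assumed definition for illustration). Pairs are checked one by one, so the blocking, indexing, and partitioning of the actual PEXESO framework are not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def pivot_table(vectors: Sequence[Vector], pivots: Sequence[Vector]) -> List[List[float]]:
    """Precompute the distance from every vector to every pivot."""
    if not pivots:
        raise ValueError("at least one pivot is required")
    return [[math.dist(v, p) for p in pivots] for v in vectors]


def joinable_fraction(query: Sequence[Vector], target: Sequence[Vector],
                      pivots: Sequence[Vector], tau: float) -> Tuple[float, int]:
    """Return (fraction of query vectors with a match within tau, exact distance calls)."""
    q_piv = pivot_table(query, pivots)
    x_piv = pivot_table(target, pivots)
    matched, exact_calls = 0, 0
    for q, qd in zip(query, q_piv):
        for x, xd in zip(target, x_piv):
            # Triangle inequality: |d(q,p) - d(x,p)| <= d(q,x) for every pivot p.
            lower = max(abs(a - b) for a, b in zip(qd, xd))
            if lower > tau:
                continue  # pruned: (q, x) cannot be within tau
            # Triangle inequality: d(q,x) <= d(q,p) + d(p,x), a cheap acceptance test.
            upper = min(a + b for a, b in zip(qd, xd))
            if upper <= tau:
                matched += 1
                break
            exact_calls += 1  # verification step: compute the real distance
            if math.dist(q, x) <= tau:
                matched += 1
                break
    return matched / len(query), exact_calls


def is_joinable(query: Sequence[Vector], target: Sequence[Vector],
                pivots: Sequence[Vector], tau: float, t_min: float) -> bool:
    """Hypothetical joinability test: enough query values must find a close match."""
    fraction, _ = joinable_fraction(query, target, pivots, tau)
    return fraction >= t_min
```

For instance, with query vectors (0, 0) and (5, 5), target vectors (0.1, 0) and (10, 10), pivots (0, 0) and (10, 0), and `tau = 0.5`, the first query vector is accepted by the upper bound alone and the second is pruned against both targets, so `joinable_fraction` returns `(0.5, 0)` without computing a single exact distance.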
DeepJoin: Joinable Table Discovery with Pre-trained Language Models
Due to the usefulness in data enrichment for data analysis tasks, joinable
table discovery has become an important operation in data lake management.
Existing approaches target equi-joins, the most common way of combining tables
for creating a unified view, or semantic joins, which tolerate misspellings and
different formats to deliver more join results. They are either exact solutions
whose running time is linear in the sizes of the query column and the target table
repository or approximate solutions lacking precision. In this paper, we
propose Deepjoin, a deep learning model for accurate and efficient joinable
table discovery. Our solution is an embedding-based retrieval, which employs a
pre-trained language model (PLM) and is designed as one framework serving both
equi- and semantic joins. We propose a set of contextualization options to
transform column contents to a text sequence. The PLM reads the sequence and is
fine-tuned to embed columns to vectors such that columns are expected to be
joinable if they are close to each other in the vector space. Since the output
of the PLM is fixed in length, the subsequent search procedure becomes
independent of the column size. With a state-of-the-art approximate nearest
neighbor search algorithm, the search time is logarithmic in the repository
size. To train the model, we devise techniques for preparing training data
as well as data augmentation. The experiments on real datasets demonstrate that
by training on a small subset of a corpus, Deepjoin generalizes to large
datasets and its precision consistently exceeds that of other approximate
solutions. Deepjoin is even more accurate than an exact solution to semantic
joins when evaluated with labels from experts. Moreover, when equipped with a
GPU, Deepjoin is up to two orders of magnitude faster than existing solutions.
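The shape of this embedding-based retrieval can be sketched in Python. Everything here is an assumption for illustration: the serialization template in `contextualize`, the hashed character-trigram `embed` function (a stand-in for the fine-tuned PLM, which is not reproduced), and the brute-force `top_k` search (used in place of an approximate nearest neighbor index). The sketch only shows the property the abstract relies on: every column becomes a fixed-length vector, so the search no longer depends on how many values a column holds.

```python
import math
import zlib
from typing import Dict, List, Sequence


def contextualize(title: str, column: str, values: Sequence[str], max_values: int = 50) -> str:
    """Serialize a column into one text sequence (hypothetical template)."""
    shown = ", ".join(values[:max_values])
    return f"{title}. {column} contains {len(values)} values: {shown}"


def embed(text: str, dim: int = 256) -> List[float]:
    """Stand-in for a fine-tuned PLM: hashed character trigrams, L2-normalized.

    The output length is always `dim`, however many values the column holds.
    """
    vec = [0.0] * dim
    t = text.lower()
    for i in range(len(t) - 2):
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vec[zlib.crc32(t[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int) -> List[str]:
    """Brute-force cosine search over unit vectors; an ANN index would replace this."""
    scored = sorted(
        ((sum(a * b for a, b in zip(query_vec, vec)), cid) for cid, vec in index.items()),
        reverse=True,
    )
    return [cid for _, cid in scored[:k]]


if __name__ == "__main__":
    repo = {
        "cities": contextualize("World cities", "city", ["Berlin", "Paris", "Tokyo"]),
        "genes": contextualize("Gene list", "symbol", ["BRCA1", "TP53", "EGFR"]),
    }
    index = {cid: embed(text) for cid, text in repo.items()}
    query = embed(contextualize("Trips", "destination", ["Paris", "Berlin", "Rome"]))
    print(top_k(query, index, k=1))
```

Because the vectors are unit-normalized, the dot product in `top_k` is the cosine similarity; swapping the brute-force loop for an approximate index is what would make the search time sublinear in the repository size.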
Self-paced Weight Consolidation for Continual Learning
Continual learning algorithms that keep the parameters of new tasks close to
those of previous tasks are popular for preventing catastrophic forgetting in
sequential task learning settings. However, 1) the performance of the new
continual learner degrades if the contributions of previously learned tasks are
not distinguished; 2) the computational cost grows substantially
with the number of tasks, since most existing algorithms need to regularize all
previous tasks when learning new tasks. To address the above challenges, we
propose a self-paced Weight Consolidation (spWC) framework to attain robust
continual learning via evaluating the discriminative contributions of previous
tasks. To be specific, we develop a self-paced regularization to reflect the
priorities of past tasks via measuring difficulty based on key performance
indicator (i.e., accuracy). When encountering a new task, all previous tasks
are sorted from "difficult" to "easy" based on the priorities. Then the
parameters of the new continual learner will be learned via selectively
maintaining the knowledge amongst more difficult past tasks, which could well
overcome catastrophic forgetting at a lower computational cost. We adopt an
alternate convex search to iteratively update the model parameters and
priority weights in the bi-convex formulation. The proposed spWC framework is
plug-and-play, which is applicable to most continual learning algorithms (e.g.,
EWC, MAS and RCIL) in different directions (e.g., classification and
segmentation). Experimental results on several public benchmark datasets
demonstrate that our proposed framework can effectively improve performance
when compared with other popular continual learning algorithms.
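The two ingredients the spWC abstract describes, ranking past tasks by difficulty and regularizing only toward the selected ones, can be sketched in Python. The abstract does not give the exact self-paced regularizer, so the hard threshold `age` (keep a task when 1 - accuracy >= age) and the EWC-style quadratic penalty below are assumptions, and `PastTask` is a hypothetical container. In an alternate convex search one would alternate between recomputing the weights with the parameters fixed and updating the parameters with the weights fixed; only the two pieces are shown here, not the loop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PastTask:
    name: str
    accuracy: float       # current accuracy on the task (the difficulty indicator)
    anchor: List[float]   # parameters saved after the task was learned
    fisher: List[float]   # per-parameter importance (EWC-style diagonal Fisher)


def rank_by_difficulty(tasks: List[PastTask]) -> List[str]:
    """Order past tasks from 'difficult' (low accuracy) to 'easy' (high accuracy)."""
    return [t.name for t in sorted(tasks, key=lambda t: t.accuracy)]


def priority_weights(tasks: List[PastTask], age: float) -> List[float]:
    """Hard self-paced weights (assumption): keep a task if 1 - accuracy >= age."""
    return [1.0 if 1.0 - t.accuracy >= age else 0.0 for t in tasks]


def consolidation_penalty(theta: List[float], tasks: List[PastTask],
                          weights: List[float], lam: float = 1.0) -> float:
    """Weighted EWC-style quadratic penalty over the selected past tasks."""
    total = 0.0
    for w, task in zip(weights, tasks):
        if w == 0.0:
            continue  # easy tasks are skipped, so fewer penalty terms are computed
        total += w * sum(f * (p - a) ** 2
                         for f, p, a in zip(task.fisher, theta, task.anchor))
    return 0.5 * lam * total
```

Lowering `age` admits easier tasks into the penalty, trading extra computation for broader protection against forgetting; this is one plausible reading of the "difficult to easy" ordering, not the paper's formulation.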
Enhancement of Cement Paste with Carboxylated Carbon Nanotubes and Poly(Vinyl Alcohol)
Cement has been one of the world's most widely consumed construction materials since its invention, but its low flexural strength is the main defect limiting the service life of structures. To adapt cement-based materials to more demanding environments, carboxylated carbon nanotubes (CNTs-COOH) and poly(vinyl alcohol) (PVA) are proposed to enhance the mechanical properties of cement paste. This study systematically verifies the synergistic effect of CNTs-COOH/PVA on the performance of cement paste. First, UV-Vis and FTIR spectroscopy show that CNTs-COOH can provide attachment sites for PVA and that PVA can improve the dispersion and stability of CNTs-COOH in water, which demonstrates the feasibility of synergistically enhancing cement paste. When a 0.015% CNTs-COOH suspension with 0.1% PVA is added, the flexural strength of the cement paste increases by 73, 32, and 42% compared with control specimens at curing ages of 3, 7, and 28 days, respectively. The strength enhancement mechanism is revealed from two aspects: cement matrix enhancement and interface enhancement. Thermogravimetric (TG) analysis and mercury intrusion porosimetry (MIP) show that CNTs-COOH can increase the hydration degree of the cement matrix and fill the pores introduced by PVA. Given that PVA can improve the dispersibility and the nucleation site effect of CNTs-COOH in cement paste, molecular dynamics simulation confirms that PVA can bridge CNTs-COOH and C-S-H, enhancing the interfacial bonding by 64.1%.
Gait Cycle-Inspired Learning Strategy for Continuous Prediction of Knee Joint Trajectory from sEMG
Predicting lower limb motion intent is vital for controlling exoskeleton
robots and prosthetic limbs. Surface electromyography (sEMG) has attracted
increasing attention in recent years because it enables prediction of
motion intentions before the actual movement. However, the estimation performance
of human joint trajectory remains a challenging problem due to the inter- and
intra-subject variations. The former is related to physiological differences
(such as height and weight) and preferred walking patterns of individuals,
while the latter is mainly caused by irregular and gait-irrelevant muscle
activity. This paper proposes a model integrating two gait cycle-inspired
learning strategies to mitigate this challenge in predicting human knee joint
trajectory. The first strategy is to decouple knee joint angles into motion
patterns and amplitudes: the former exhibit low variability, while the latter
show high variability among individuals. By learning through separate network entities,
the model manages to capture both the common and personalized gait features. In
the second, muscle principal activation masks are extracted from gait cycles in
a prolonged walk. These masks are used to filter out components unrelated to
walking from raw sEMG and provide auxiliary guidance to capture more
gait-related features. Experimental results indicate that our model can
predict knee angles with an average root mean square error (RMSE) of
3.03 (0.49) degrees, 50 ms ahead of time. To our knowledge, this is the best
performance reported in the relevant literature, reducing RMSE
by at least 9.5%.
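The preprocessing side of both strategies can be sketched in Python. The per-cycle min-max split into a [0, 1] pattern, an amplitude, and an offset, and the mask built by averaging time-normalized sEMG envelopes over cycles and keeping phases above `ratio` times the peak, are illustrative assumptions; the abstract does not specify either procedure, and the separate network entities that learn from these quantities are not shown. All function names and the default `n = 101` (one sample per percent of the gait cycle) are hypothetical.

```python
from typing import List, Sequence, Tuple


def resample(cycle: Sequence[float], n: int = 101) -> List[float]:
    """Linearly interpolate one gait cycle onto n evenly spaced points (0-100 %)."""
    if n < 2 or len(cycle) < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(cycle) - 1
    out = []
    for k in range(n):
        pos = k * last / (n - 1)
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(cycle[i] * (1.0 - frac) + cycle[i + 1] * frac)
    return out


def decouple(cycle: Sequence[float]) -> Tuple[List[float], float, float]:
    """Split a knee-angle cycle into a [0, 1] pattern, an amplitude, and an offset."""
    lo, hi = float(min(cycle)), float(max(cycle))
    amp = hi - lo
    pattern = [0.0] * len(cycle) if amp == 0.0 else [(a - lo) / amp for a in cycle]
    return pattern, amp, lo


def recouple(pattern: Sequence[float], amp: float, offset: float) -> List[float]:
    """Inverse of decouple: rebuild joint angles from pattern and amplitude."""
    return [offset + amp * p for p in pattern]


def activation_mask(envelopes: Sequence[Sequence[float]], ratio: float = 0.5) -> List[float]:
    """Average time-normalized sEMG envelopes over cycles; keep phases above ratio * peak."""
    n = len(envelopes[0])
    mean = [sum(env[k] for env in envelopes) / len(envelopes) for k in range(n)]
    peak = max(mean)
    return [1.0 if m >= ratio * peak else 0.0 for m in mean]


def apply_mask(envelope: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Zero out envelope samples in phases where the muscle is not principally active."""
    return [e * m for e, m in zip(envelope, mask)]
```

For example, `decouple([0, 30, 60, 30, 0])` yields the pattern `[0.0, 0.5, 1.0, 0.5, 0.0]` with amplitude `60.0` and offset `0.0`; the pattern is shared in shape across subjects, while the amplitude carries the individual differences.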
Does temporary transfer to preoperative hemodialysis influence postoperative outcomes in patients on peritoneal dialysis? A retrospective cohort study
Background: The associations between preoperative transfer to hemodialysis (HD) and postoperative outcomes in patients on chronic peritoneal dialysis (PD) remain unknown. We conducted this retrospective cohort study to investigate whether preoperative HD influences surgical outcomes in PD patients undergoing major surgery.
Methods: All chronic PD patients who underwent major surgery from January 1, 2007, to December 31, 2020, at Peking University First Hospital were screened. Major surgery was defined as a surgical procedure under general, lumbar or epidural anesthesia with a hospital stay of more than one night. Patients under the age of 18, patients with a dialysis duration of less than 3 months, and those who underwent renal implantation surgeries or procedures aimed exclusively at placing or removing PD catheters were excluded. Enrolled patients were divided into an HD group or a PD group according to their preoperative dialysis status for further analysis.
Results: Of 105 PD patients enrolled, 65 continued PD and 40 switched to HD preoperatively. Patients on preoperative HD were significantly more likely to develop postoperative hyperkalemia, and their total complication rates were numerically higher. After adjustment, however, the incidence of postoperative hyperkalemia and the rates of other postoperative complications were similar between groups. Long-term survival did not differ between the two groups.
Conclusions: It does not appear necessary for PD patients to switch to temporary HD before major surgery.
- …